315 research outputs found

    Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

    The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and downsampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance. Comment: Accepted to ICASSP 2019.
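
    The abstract describes a stack of self-attention layers whose per-frame outputs feed a CTC loss directly, with no decoder. As a rough illustration only, here is a minimal PyTorch sketch of that pairing; the layer sizes, label count, and omission of positional encoding and downsampling are placeholders, not the paper's configuration:

        # Minimal sketch: self-attentional encoder + CTC loss (not SAN-CTC itself).
        import torch
        import torch.nn as nn

        class SelfAttentionCTC(nn.Module):
            def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=6, n_labels=32):
                super().__init__()
                self.proj = nn.Linear(n_feats, d_model)      # frame features -> model width
                layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, n_layers)
                self.out = nn.Linear(d_model, n_labels)      # per-frame logits, label 0 = blank

            def forward(self, x):                            # x: (batch, time, n_feats)
                h = self.encoder(self.proj(x))               # positional encoding omitted here
                return self.out(h).log_softmax(-1)           # CTC consumes log-probabilities

        model, ctc = SelfAttentionCTC(), nn.CTCLoss(blank=0)
        x = torch.randn(2, 100, 80)                          # two utterances of 100 frames
        labels = torch.randint(1, 32, (2, 20))               # dummy 20-label transcripts
        log_probs = model(x).transpose(0, 1)                 # CTCLoss expects (time, batch, labels)
        loss = ctc(log_probs, labels, torch.full((2,), 100), torch.full((2,), 20))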

    Transformation Based Interpolation with Generalized Representative Values

    Fuzzy interpolation offers the potential to model problems with sparse rule bases, as opposed to the dense rule bases deployed in traditional fuzzy systems. It thus supports the simplification of complex fuzzy models and facilitates inference when only limited knowledge is available. This paper first introduces the general concept of representative values (RVs) and then uses it to present an interpolative reasoning method that can interpolate fuzzy rules involving arbitrary polygonal fuzzy sets, by means of scale and move transformations. Various interpolation results over different RV implementations are illustrated to show the flexibility and diversity of this method. A realistic application shows that the interpolation-based inference can outperform conventional inference methods.
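
    Since the representative value is the pivot of the whole approach, a toy computation helps fix the idea. The sketch below, with made-up triangular sets, uses the simplest RV definition in this line of work, the average of a set's characteristic points; the paper's point is precisely that other definitions can be substituted here:

        # One RV definition: the average of a polygonal fuzzy set's points.
        def rep_value(points):
            """Representative value of a polygonal fuzzy set, given the
            x-coordinates of its characteristic points, left to right."""
            return sum(points) / len(points)

        # Triangular fuzzy sets written as (left, peak, right) x-coordinates.
        A1, A2 = (0, 5, 6), (11, 13, 18)   # antecedents of two adjacent rules
        A_obs = (7, 8, 9)                  # an observation falling between them

        # Relative placement of the observation between the two rules,
        # measured on representative values; this ratio drives interpolation.
        lam = (rep_value(A_obs) - rep_value(A1)) / (rep_value(A2) - rep_value(A1))
        print(round(lam, 2))               # 0.42: A_obs sits ~42% of the way to A2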

    Fuzzy interpolative reasoning via scale and move transformation

    Interpolative reasoning not only helps reduce the complexity of fuzzy models but also makes inference possible in sparse rule-based systems. This paper presents an interpolative reasoning method based on scale and move transformations. It can be used to interpolate fuzzy rules involving complex polygonal, Gaussian, or other bell-shaped fuzzy membership functions. The method works by first constructing a new inference rule via manipulating two given adjacent rules, and then using scale and move transformations to convert the intermediate inference results into the final derived conclusions. The proposed transformations give the method three advantages: 1) it can handle interpolation of multiple antecedent variables with simple computation; 2) it guarantees the uniqueness as well as the normality and convexity of the resulting interpolated fuzzy sets; and 3) it admits a variety of definitions for representative values, providing a degree of freedom to meet different requirements. Comparative experimental studies are provided to demonstrate the potential of this method.
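
    To make the two-step structure concrete, the sketch below handles triangular sets only; `lam` is assumed to have been computed from representative values as in the earlier sketch, and the final scale rate shown is a stand-in, since deriving it from the antecedent transformation is the part of the method elided here:

        # Step 1: build an intermediate rule from the two neighbouring rules;
        # Step 2: scale (and, in the full method, move) it toward the observation.
        def interpolate(p, q, lam):
            """Pointwise convex combination of two same-arity polygonal sets."""
            return tuple((1 - lam) * a + lam * b for a, b in zip(p, q))

        def scale(points, rate):
            """Scale a set about its representative value by `rate`."""
            rv = sum(points) / len(points)
            return tuple(rv + rate * (x - rv) for x in points)

        A1, A2 = (0, 5, 6), (11, 13, 18)    # antecedents of the two rules
        B1, B2 = (0, 2, 4), (10, 11, 13)    # their consequents
        lam = 0.42                          # observation's placement between the rules

        B_mid = interpolate(B1, B2, lam)    # intermediate consequent
        # In the method proper, the scale/move rates are read off the
        # antecedent transformation and replayed here; 0.3 is a placeholder.
        B_star = scale(B_mid, 0.3)          # final interpolated conclusion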

    Fuzzy interpolation with generalized representative values

    Fuzzy interpolative reasoning offers the potential to model problems using sparse rule bases, as opposed to the dense rule bases deployed in traditional fuzzy systems. It thus supports the simplification of complex fuzzy models in terms of the number of rules and facilitates inference when limited knowledge is available. This paper presents an interpolative reasoning method based on scale and move transformations.

    Scale and move transformation-based fuzzy interpolative reasoning: A revisit

    This paper generalises the previously proposed interpolative reasoning method [15] to cover interpolations involving complex polygonal, Gaussian, or other bell-shaped fuzzy membership functions, a generalisation made possible by the proposed scale and move transformations. The method works by first constructing a new inference rule via manipulating two given adjacent rules, and then using scale and move transformations to convert the intermediate inference results into the final derived conclusions. The generalised method has two advantages thanks to the proposed transformations: 1) it can easily handle interpolation of multiple antecedent variables with simple computation; and 2) it guarantees the uniqueness as well as the normality and convexity of the resulting interpolated fuzzy sets. Numerical examples are provided to demonstrate the use of this method.

    Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image; image captions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K, and MS COCO), where it outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, achieving significant performance improvements over state-of-the-art methods that directly optimize the ranking objective function for retrieval. The project page of this work is www.stat.ucla.edu/~junhua.mao/m-RNN.html. Comment: A simple strategy is added that significantly boosts performance on the image captioning task; more details are given in Section 8 of the paper. The code and related data are available at https://github.com/mjhucla/mRNN-CR. arXiv admin note: substantial text overlap with arXiv:1410.1090.
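
    The key structural idea, a multimodal layer where the word embedding, recurrent state, and CNN image feature are projected into one shared space before the word softmax, is easy to sketch. The PyTorch fragment below is illustrative only; the dimensions and the additive fusion are placeholder assumptions rather than the paper's exact configuration:

        # Sketch of an m-RNN-style multimodal layer (all sizes illustrative).
        import torch
        import torch.nn as nn

        class MRNNSketch(nn.Module):
            def __init__(self, vocab=10000, d_emb=256, d_rnn=256, d_img=4096, d_mm=512):
                super().__init__()
                self.embed = nn.Embedding(vocab, d_emb)
                self.rnn = nn.RNN(d_emb, d_rnn, batch_first=True)
                self.w_word = nn.Linear(d_emb, d_mm)   # word-embedding path
                self.w_rnn = nn.Linear(d_rnn, d_mm)    # recurrent-state path
                self.w_img = nn.Linear(d_img, d_mm)    # image path (e.g. a CNN fc feature)
                self.out = nn.Linear(d_mm, vocab)      # next-word logits

            def forward(self, words, img):             # words: (B, T) ids; img: (B, d_img)
                e = self.embed(words)
                h, _ = self.rnn(e)
                m = torch.tanh(self.w_word(e) + self.w_rnn(h)
                               + self.w_img(img).unsqueeze(1))  # broadcast image over time
                return self.out(m)                     # per-step distribution over words

        logits = MRNNSketch()(torch.randint(0, 10000, (2, 7)), torch.randn(2, 4096))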